We study the effects of finite-precision representation of a source's probabilities on the efficiency of classic source coding algorithms, such as Shannon, Gilbert-Moore, or arithmetic codes. In particular, we establish the following simple connection between the redundancy R and the number of bits W necessary for representing the source's probabilities in computer memory (R is assumed to be small):

$$W \lesssim \eta \log_2 \frac{m}{R},$$

where m is the cardinality of the source's alphabet, and $\eta \le 1$ is an implementation-specific constant. In the case of binary alphabets (m = 2) we show that there exist codes for which $\eta = 1/2$, and in the m-ary case (m > 2) we show that there exist codes for which $\eta = m/(m+1)$. In the general case, however (which includes designs relying on progressive updates of frequency counters), we show that $\eta = 1$. The usefulness of these results for practical designs of source coding algorithms is also discussed.
Since Shannon, it has been known that the average rate of a code constructed for a stochastic source cannot be lower than the entropy of this source. The difference between the rate and the entropy is called the redundancy of the code.

It is also known that the redundancy of a code fundamentally depends on the number of input symbols that are jointly mapped into a codeword during the encoding process. This number (say, n) is usually called the block size or delay of the code.
The ratio of redundancy over n is called the redundancy rate of a code, and the 
speed of its convergence has long been considered a key criterion in understanding 
the effectiveness of source codes. 

For example, it was shown that many classic codes for known memoryless sources (such as block Shannon, Huffman, or Gilbert-Moore codes [HE]) attain a redundancy rate of $R = O(1/n)$ [H] [2] [9] [15] [US]. Krichevsky-Trofimov codes for a class of memoryless sources achieve the rate of $O(\log n / n)$ [SJ]. Lempel-Ziv codes for this class were shown to converge at the rate of $O(\log\log n / \log n)$ [E9HB], etc.

At the same time, much less is known about the connection between the redundancy and typical implementation constraints, such as the width of a computer's registers.
Perhaps the best example of a prior effort in exploring this connection is the development of finite-precision implementations of arithmetic codes (cf. Pasco [10], Rubin [13], Witten et al. [17]). Simple redundancy bounds for such finite-precision algorithms were offered by Han and Kobayashi [8] and by Ryabko and Fionov [14]. However, both of these results are specific to particular implementations.
In this paper, we pose and study somewhat more general questions: 

• "What is the achievable redundancy of a code that can be constructed by a machine with W-bit representations of the source's probabilities?", and/or

• "What is the smallest number of bits that one should use for representing the source's probabilities in order to construct a code with target redundancy R?"

For simplicity, we assume that codes can use infinite delays, and that the redundancy is caused only by errors of finite-precision representations of probabilities.

If the answer to the first question is known, it can be treated as a lower bound on the achievable redundancy of any source coding algorithm with such a constraint on the internal representation of the source's probabilities (or their estimates). The answer to the second question is of immediate practical interest.

This paper is organized as follows. In Section 2, we present the formal setting of our problems and describe our results. Their possible applications are discussed in Section 3. All proofs and supplemental information from Diophantine approximation theory (which we use to derive our results) are given in Appendix A.



2 Definitions and Main Results 

Consider an m-ary memoryless source S, producing symbols $a_1, \ldots, a_m$ with probabilities $p_1, \ldots, p_m$. We assume that in our source coding algorithm, instead of the true probabilities $p_1, \ldots, p_m$, we have to use their approximations $\hat{p}_1, \ldots, \hat{p}_m$.

By $\delta_i = p_i - \hat{p}_i$ $(i = 1, \ldots, m)$ we denote the probability differences (errors) for each symbol, and by

$$\delta^* = \max_i |p_i - \hat{p}_i| = \max_i |\delta_i| \qquad (1)$$

we denote their maximum absolute value. Further, by $p_{\min}$ we denote the smallest probability in the source:

$$p_{\min} = \min_i p_i \qquad (2)$$

and we will assume that $p_{\min} > 0$, and that it is also relatively large compared to our maximum approximation error:

$$\frac{\delta^*}{p_{\min}} < 1. \qquad (3)$$

2.1 Upper Bounds for Redundancy 

We first show that, under the conditions described above, the loss in compression efficiency incurred by using $\hat{p}_1, \ldots, \hat{p}_m$ is bounded by the following simple expression.

Lemma 1.

$$D(p\|\hat{p}) \le m\delta^* \frac{1}{1 - \delta^*/p_{\min}} = m\delta^*\left(1 + O(\delta^*)\right). \qquad (4)$$
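Lemma 1 can be checked numerically. The sketch below is a minimal illustration, not part of the original analysis: it assumes D is measured in nats (natural logarithm), which is the base for which the inequality used in the proof in Appendix A holds, and the helper names `kl_divergence` and `lemma1_bound` as well as the example probabilities are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats (natural logarithm)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def lemma1_bound(p, q):
    """Right-hand side of (4): m * delta* / (1 - delta* / p_min)."""
    m = len(p)
    delta_star = max(abs(pi - qi) for pi, qi in zip(p, q))  # eq. (1)
    p_min = min(p)                                           # eq. (2)
    if delta_star / p_min >= 1:
        raise ValueError("condition (3) is violated: delta*/p_min must be < 1")
    return m * delta_star / (1 - delta_star / p_min)


# Example: a three-symbol source and slightly perturbed probabilities.
p = [0.5, 0.3, 0.2]
p_hat = [0.49, 0.31, 0.2]
print(kl_divergence(p, p_hat))  # actual redundancy, about 2.6e-4 nats
print(lemma1_bound(p, p_hat))   # bound (4), about 3.2e-2 nats
```

As the example suggests, the bound is loose for small perturbations; it is meant as a simple, order-of-magnitude guarantee.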
We next consider a more specific source coding scheme, in which the source's probabilities are approximated by rational values:

$$\hat{p}_i = \frac{f_i}{t}, \quad f_i, t \in \mathbb{N} \quad (i = 1, \ldots, m), \qquad (5)$$

where

$$t = \sum_{i=1}^{m} f_i \qquad (6)$$

is the common denominator.

We derive two results regarding the attainable redundancy due to such approximations. We start with a more general (and weaker) statement:

Theorem 1. Given any $t \gg 1$, it is possible to find rational approximations (5) of the source's probabilities such that:

$$D(p\|f/t) \le \frac{m}{2t}\left(1 + O(t^{-1})\right). \qquad (7)$$

This shows that the redundancy decreases approximately in inverse proportion to t. This bound holds for any value of t.
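The construction behind Theorem 1 (rounding each $tp_i$ to the nearest integer, as in the proof in Appendix A) can be sketched as follows. This is an illustration under stated assumptions: the three-symbol source below is a hypothetical example, D is again measured in nats, and values of t for which the rounded counts do not add up to t are skipped, since then (6) would not hold. The function names are hypothetical.

```python
import math


def round_to_denominator(p, t):
    """f_i = nearest integer to t * p_i; None if the f_i do not sum to t."""
    f = [round(t * pi) for pi in p]
    return f if sum(f) == t else None


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def theorem1_check(p, t_values):
    """For each admissible t, return (t, delta*, D, bound), where bound is
    m / (2t) / (1 - 1/(2t p_min)), i.e. Lemma 1 with delta* <= 1/(2t)."""
    m, p_min = len(p), min(p)
    rows = []
    for t in t_values:
        f = round_to_denominator(p, t)
        if f is None:
            continue
        q = [fi / t for fi in f]
        delta_star = max(abs(pi - qi) for pi, qi in zip(p, q))
        bound = (m / (2 * t)) / (1 - 1 / (2 * t * p_min))
        rows.append((t, delta_star, kl_divergence(p, q), bound))
    return rows


# Hypothetical source with irrational probabilities.
p1 = math.sqrt(2) - 1
p2 = (math.sqrt(3) - 1) / 2
p = [p1, p2, 1 - p1 - p2]

for t, delta_star, d, bound in theorem1_check(p, [10, 100, 1000]):
    print(t, delta_star, d, bound)
```

For this source every $tp_i$ with $t \ge 3$ rounds to a positive count, so all logarithms in the sketch are finite.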

Nevertheless, a more detailed study of this problem (which involves the use of tools and results from Diophantine approximation theory [3]) reveals that among the various possible values of the denominator t, there exist some for which the precision of approximation (5) can be much higher. In turn, this leads to much lower redundancy of source codes based on such approximations.

We claim the following: 

Theorem 2. There exist infinitely many integers $t, f_1, \ldots, f_m$ $(m > 2)$ producing approximations (5) of the source's probabilities such that:

$$D(p\|f/t) < \frac{m^2}{m+1}\, t^{-1-1/m}\left(1 + O\!\left(t^{-1-1/m}\right)\right). \qquad (8)$$

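In the case when m = 2, there exist infinitely many integers $t, f_1$ (with $f_2 = t - f_1$) producing approximations (5) of the source's probabilities such that:

$$D(p\|f/t) < 2\kappa\, t^{-2}\left(1 + O(t^{-2})\right), \qquad (9)$$

where:

$$\kappa = \begin{cases} 5^{-1/2}, & \text{if } p_1 = \dfrac{r\zeta + s}{u\zeta + v},\ \zeta = \tfrac{1}{2}(\sqrt{5} - 1),\ rv - us = \pm 1,\ r, s, u, v \in \mathbb{Z}, \\[4pt] 2^{-3/2}, & \text{otherwise}. \end{cases} \qquad (10)$$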



The above result is not immediately obvious, as it implies that codes relying on rational approximations (5) and constructed for binary sources can be much more precise than codes using an equally large parameter t but constructed for m-ary sources (m > 2). In general, based on the above result, the larger the cardinality of the alphabet, the more severe the effect of approximating the source's probabilities.

Recall that most traditional redundancy bounds for codes for memoryless sources (such as delay-redundancy relations, obtained assuming an infinite-precision implementation) do not change their order with the cardinality of the alphabet. Our finding above indicates that for many practical algorithms this may no longer be the case once one starts accounting for redundancy contributions caused by a finite-precision implementation.
2.2 Precision (and Memory Usage) vs Redundancy

Consider an implementation of a source coding algorithm employing rational approximations (5) of the source's probabilities. We assume that such an algorithm can store either the individual values $f_i$ or their cumulative equivalents (as is convenient, e.g., in the design of Shannon, Gilbert-Moore, or arithmetic codes):

$$s_i = \sum_{j=1}^{i-1} f_{k_j}, \qquad \{k_j\}:\ p_{k_1} \le \ldots \le p_{k_m}. \qquad (11)$$

In either case, the maximum integer value that might be stored per symbol is t. 
Hence, the required register width or memory cost per symbol W in such a scheme 
is simply: 

$$W = \lceil \log_2 t \rceil. \qquad (12)$$
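For illustration, the following sketch computes cumulative counts in the spirit of (11) and the register width (12). It assumes that $s_i$ accumulates the counts of the symbols preceding the i-th one in order of increasing probability; the function names and the example counts are hypothetical.

```python
def cumulative_counts(f, p):
    """Return the symbol order k_1..k_m (increasing p) and the cumulative
    counts s_1..s_m, where s_i is the sum of f over k_1..k_{i-1}."""
    order = sorted(range(len(p)), key=lambda i: p[i])  # k_1, ..., k_m
    s, running = [], 0
    for k in order:
        s.append(running)
        running += f[k]
    return order, s


def register_width(t):
    """W = ceil(log2 t), computed exactly with integer arithmetic (t >= 1)."""
    return (t - 1).bit_length()


f = [414, 366, 220]            # counts with common denominator t = 1000
p = [0.4142, 0.3660, 0.2198]   # probabilities they approximate
order, s = cumulative_counts(f, p)
print(order, s)                # [2, 1, 0] [0, 220, 586]
print(register_width(sum(f)))  # 10
```

Using `bit_length` instead of a floating-point `log2` avoids rounding surprises when t is an exact power of two.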
Using our prior results, we derive the following simple bounds for this quantity: 

Corollary 1. In the design of a source coding scheme, the number of bits W used for representing the source's probabilities and the resulting increase in redundancy R satisfy:

$$W < \log_2 \frac{m}{2R} + 1 + O(R) = \log_2 \frac{m}{R} + O(R). \qquad (13)$$

Corollary 2. There exist infinitely many implementations of source codes in which the number of bits W used for representing the source's probabilities and the resulting increase in redundancy R satisfy:

$$W < \frac{m}{m+1}\log_2\frac{1}{R} + \frac{m}{m+1}\log_2\frac{m^2}{m+1} + 1 + O(R) = \frac{m}{m+1}\log_2\frac{m}{R} + O(1), \qquad (14)$$

where m > 2 is the cardinality of the source's alphabet.

In the case when m = 2, there exist infinitely many implementations of binary source codes in which the number of bits W used for representing the source's probabilities and the resulting increase in redundancy R satisfy:

$$W < \frac{1}{2}\log_2\frac{1}{R} + \frac{1}{2}\log_2 (2\kappa) + 1 + O(R) = \frac{1}{2}\log_2\frac{1}{R} + O(1), \qquad (15)$$

where $\kappa$ is the constant defined in (10).

Here, it can be seen that the use of binary alphabets can lead to savings of up to a factor of 2 in the number of bits needed to represent probabilities.
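The leading terms of these bounds are easy to evaluate. The sketch below solves the redundancy bounds of Theorems 1 and 2 for t and applies $W < \log_2 t + 1$, dropping the O(R) terms; it assumes R is small and expressed in the same units as D, uses the "otherwise" value $\kappa = 2^{-3/2}$ by default, and the function names are hypothetical. The m-ary and binary figures apply only to the infinitely many favorable denominators guaranteed by Theorem 2.

```python
import math


def w_general(R, m):
    """Corollary 1: W < log2(m / (2R)) + 1 = log2(m / R)."""
    return math.log2(m / R)


def w_mary(R, m):
    """Corollary 2, m > 2: from R < m^2/(m+1) * t^(-(m+1)/m)."""
    eta = m / (m + 1)
    return eta * math.log2(1 / R) + eta * math.log2(m * m / (m + 1)) + 1


def w_binary(R, kappa=2 ** -1.5):
    """Corollary 2, m = 2: from R < 2 * kappa * t^(-2); kappa as in (10)."""
    return 0.5 * math.log2(1 / R) + 0.5 * math.log2(2 * kappa) + 1


R = 1e-6
print(w_general(R, 2), w_binary(R))    # roughly 20.9 vs 10.7 bits
print(w_general(R, 4), w_mary(R, 4))   # roughly 21.9 vs 18.3 bits
```

As R decreases, the ratio `w_general(R, 2) / w_binary(R)` approaches 2, matching the factor-of-2 saving noted above.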

It should be noted, however, that this factor of 2 does not imply that conversion to binary alphabets will produce an equivalent saving in overall storage usage. The problem here is that in order to implement such codes one needs to store not only the values $f_1$ or $s_1$, but also $f_2$, $s_2$, or t. For example, if one were to substitute an m-ary source with a cascade (binary tree) of m − 1 binary sources, this would increase memory usage by a factor of 2(m − 1)/m, before any of the gains predicted by our Corollary 2 are taken into account. Nevertheless, such a conversion can still be justified by the need to use shorter registers.
In general, given an m-ary source $(m \ge 2)$, we can say that the amount of memory M needed to construct its code (one that involves a rational representation of its probabilities) is at least

$$M = mW, \qquad (16)$$

where W is connected to the redundancy as predicted by our Corollaries 1 and 2.
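The memory accounting above can be made concrete with a small sketch. It assumes, purely for illustration, that every stored value (an $f_i$, $s_i$, or t) occupies the same W bits and that each of the m − 1 binary nodes of the cascade keeps two such values; the function names are hypothetical.

```python
def memory_mary(m, w):
    """M = m * W, eq. (16): one W-bit value per symbol."""
    return m * w


def memory_binary_cascade(m, w):
    """Binary tree of m - 1 binary sources, each storing two W-bit values."""
    return 2 * (m - 1) * w


def break_even_ratio(m):
    """Largest W_binary / W_mary for which the cascade uses no more memory:
    2 (m - 1) W_binary <= m W_mary."""
    return m / (2 * (m - 1))


m, w = 8, 16
print(memory_mary(m, w), memory_binary_cascade(m, w))  # 128 224
print(break_even_ratio(m))                             # about 0.571
```

With equal register widths the cascade costs 224/128 = 1.75 = 2(m − 1)/m times more memory, so it pays off in total storage only if its registers can be made sufficiently shorter.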

3 Conclusions 

Several bounds establishing a connection between redundancy and the precision of representation of probabilities in source coding algorithms have been derived.

These results can be used for maximizing the performance of codes with respect to the precision or memory available on each particular computing platform.

The results of our Theorem 2 (and Corollary 2), which reveal the existence of higher-precision approximations, particularly for sources with small alphabets, may also influence future practical designs of source coding algorithms. For example, when handling sources with large alphabets, one may consider grouping their symbols and using a cascade of codes with smaller alphabets as a way to improve their performance under register-width constraints.
A Proofs

We first prove the statement of Lemma 1:

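$$\begin{aligned}
D(p\|\hat{p}) &= \sum_{i=1}^{m} p_i \log\frac{p_i}{\hat{p}_i} \\
&= -\sum_{i=1}^{m} p_i \log\left(1 - \delta_i/p_i\right) \\
&\le -\sum_{i=1}^{m} p_i \log\left(1 - \delta^*/p_i\right) \\
&\le \sum_{i=1}^{m} p_i \frac{\delta^*/p_i}{1 - \delta^*/p_i} \qquad (17) \\
&= \sum_{i=1}^{m} \frac{\delta^*}{1 - \delta^*/p_i} \\
&\le m\delta^* \frac{1}{1 - \delta^*/p_{\min}},
\end{aligned}$$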
where (17) is due to the following inequality [pQ], which holds for the natural logarithm:

$$-\log(1 - x) \le \frac{x}{1 - x} \qquad (x < 1).$$
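A quick numerical check of this inequality is sketched below. It also shows that the base of the logarithm matters: with base-2 logarithms the inequality already fails at x = 0.1, so, as proved, Lemma 1 bounds D in nats, and the corresponding bound in bits is obtained by dividing by ln 2. The helper name `gap` and the grid are hypothetical choices for this illustration.

```python
import math


def gap(x):
    """x/(1-x) - (-ln(1-x)); the inequality claims this is >= 0 for x < 1."""
    return x / (1 - x) + math.log(1 - x)


xs = [i / 1000 for i in range(-5000, 1000)]   # grid over [-5, 0.999]
print(min(gap(x) for x in xs))                # 0.0, attained at x = 0

# The same inequality with log base 2 fails, e.g. at x = 0.1:
x = 0.1
print(-math.log2(1 - x), x / (1 - x))         # 0.152... > 0.111...
```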

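We next consider rational approximations (5) of the source probabilities. Assume that the factors $f_i$ are chosen such that

$$|\delta_i| = |p_i - f_i/t| = \frac{1}{t}\,|tp_i - f_i| = \frac{1}{t}\min_{z \in \mathbb{Z}} |tp_i - z|.$$

Since every real number lies within 1/2 of its nearest integer, we have

$$|\delta_i| \le \frac{1}{2t},$$

and, consequently,

$$\delta^* \le \frac{1}{2t}.$$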

This, combined with Lemma 1, leads to the statement of Theorem 1.

In order to prove the first statement of Theorem 2, we will need the following result regarding the attainable precision of simultaneous Diophantine approximations [3, p. 14, Theorem III]:

Fact 1. For any n > 1 irrational values $\alpha_1, \ldots, \alpha_n$, there exist infinitely many sets of integers $(a_1, \ldots, a_n, q)$ such that:

$$q^{1/n} \max\{|q\alpha_1 - a_1|, \ldots, |q\alpha_n - a_n|\} < \frac{n}{n+1}.$$

In our case (applying Fact 1 with n = m, $\alpha_i = p_i$, and q = t), this means that there must exist rational approximations of the set of probabilities $p_1, \ldots, p_m$ such that:

$$\delta^* < \frac{m}{m+1}\, t^{-1-1/m} < t^{-1-1/m}.$$

This, combined with Lemma 1, leads to bound (8) claimed by Theorem 2.

It should be noted, however, that the task of finding such Diophantine approximations is not trivial, and the reader is referred to the book by M. Groetschel, L. Lovász, and A. Schrijver [7], which discusses this problem in detail.
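For small denominators, approximations of the kind guaranteed by Fact 1 can be found by brute force. The sketch below (hypothetical function name, hypothetical three-symbol source with irrational probabilities) tries every t up to a limit, rounds each $tp_i$ to the nearest integer, and keeps those t for which $t^{1/m}\max_i|tp_i - f_i| < m/(m+1)$, the counts sum to t, and every count is positive. It is only an illustration and not one of the more efficient algorithms discussed in [7].

```python
import math


def simultaneous_approximations(p, t_max):
    """Return (t, f, delta*) for all t <= t_max such that, with
    f_i = round(t * p_i): sum(f) == t, all f_i >= 1, and
    t**(1/m) * max_i |t p_i - f_i| < m / (m + 1)   (cf. Fact 1 with q = t)."""
    m = len(p)
    found = []
    for t in range(1, t_max + 1):
        f = [round(t * pi) for pi in p]   # nearest integers minimize |t p_i - f_i|
        if sum(f) != t or min(f) < 1:
            continue
        err = max(abs(t * pi - fi) for pi, fi in zip(p, f))
        if t ** (1 / m) * err < m / (m + 1):
            found.append((t, f, err / t))  # err / t is delta* from (1)
    return found


p1 = math.sqrt(2) - 1
p2 = (math.sqrt(3) - 1) / 2
p = [p1, p2, 1 - p1 - p2]

for t, f, delta_star in simultaneous_approximations(p, 2000)[-3:]:
    # compare delta* with the generic rounding guarantee 1/(2t)
    print(t, f, delta_star, 1 / (2 * t))
```

The search costs O(m · t_max) operations, which is fine for illustration but does not scale to the register widths of practical coders.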

Consider now the binary case (m = 2). We first notice that since $p_1 + p_2 = 1$, the problem essentially reduces to studying the precision of a single approximation:

$$\hat{p}_1 = f_1/t \qquad (f_1, t \in \mathbb{N}).$$

Thus, by setting $\hat{p}_2 = f_2/t = (t - f_1)/t$, we can see that:

$$|\delta_2| = |p_2 - f_2/t| = |1 - p_1 - (t - f_1)/t| = |p_1 - f_1/t| = |\delta_1| = \delta^*.$$

Hence, here we are dealing with a scalar (one-dimensional) case of Diophantine approximation, and the following result applies [3, p. 11, Theorem V]:

Fact 2. Let $\alpha$ be irrational. Then there are infinitely many q and a such that

$$q\,|q\alpha - a| < 5^{-1/2}.$$

If $\alpha$ is equivalent to $\frac{1}{2}(\sqrt{5} - 1)$, then the constant $5^{-1/2}$ cannot be replaced by any smaller constant. If $\alpha$ is not equivalent to $\frac{1}{2}(\sqrt{5} - 1)$, then there are infinitely many q and a such that:

$$q\,|q\alpha - a| < 2^{-3/2}.$$
This means that, in our case with probabilities $p_1$ and $p_2 = 1 - p_1$, there must exist approximations for which

$$\delta^* < \kappa\, t^{-2},$$

where $\kappa$ is a constant depending on $p_1$ in the following way (which absorbs the definition of equivalence implied by the result quoted above):

$$\kappa = \begin{cases} 5^{-1/2}, & \text{if } p_1 = \dfrac{r\zeta + s}{u\zeta + v},\ \zeta = \tfrac{1}{2}(\sqrt{5} - 1),\ rv - us = \pm 1,\ r, s, u, v \in \mathbb{Z}, \\[4pt] 2^{-3/2}, & \text{otherwise}. \end{cases}$$



By combining this with Lemma 1 we arrive at the second bound (9) claimed by Theorem 2.
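In the binary case, good approximations $f_1/t$ of $p_1$ can be obtained from the continued-fraction convergents of $p_1$, a standard tool of Diophantine approximation theory (the text above does not prescribe how the approximations are found). The sketch below, with hypothetical function names and floating-point arithmetic that is adequate only for moderate denominators, computes $t^2\delta^* = t\,|tp_1 - f_1|$ along the convergents for $p_1 = \frac{1}{2}(\sqrt{5} - 1)$, where it approaches $5^{-1/2} \approx 0.447$, and for $p_1 = \sqrt{2} - 1$, where it approaches $2^{-3/2} \approx 0.354$.

```python
import math


def convergents(x, max_den):
    """Continued-fraction convergents (f, t) of x with t <= max_den."""
    h_prev, h = 0, 1  # numerators   h_{n-2}, h_{n-1}
    k_prev, k = 1, 0  # denominators k_{n-2}, k_{n-1}
    result, y = [], x
    while True:
        a = math.floor(y)                     # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        if k > max_den:
            break
        result.append((h, k))
        frac = y - a
        if frac == 0:                         # x was (numerically) rational
            break
        y = 1 / frac
    return result


def scaled_errors(p1, max_den):
    """(f1, t, t^2 * delta*) for each convergent f1/t of p1; cf. Fact 2."""
    return [(f, t, t * abs(t * p1 - f)) for f, t in convergents(p1, max_den)]


for p1 in [(math.sqrt(5) - 1) / 2, math.sqrt(2) - 1]:
    for f, t, e in scaled_errors(p1, 10_000)[-4:]:
        print(f"{f}/{t}: t^2 * delta* = {e:.5f}")
```

The printed values alternate around their limits, so some convergents fall strictly below these constants, as Fact 2 asserts for infinitely many q.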

Our Corollaries 1 and 2 are simple consequences of Theorems 1 and 2, in which we use the inequality:

$$\log_2 t \le W < \log_2 t + 1.$$